Calling for Student Submissions: AI Safety Distillation Contest
At EA UC Berkeley, we’re launching an ongoing series of contests called the Artificial Intelligence Misalignment Solutions (AIMS) series. The second contest in the series, the Distillation Contest, is now open to any student enrolled in a university or college: here are our interest and submission forms! The contest has prizes of up to $2,500 and closes on May 20th. This blog post restates the information on our website, with a bit more explanation of the contest’s purpose.
A huge thank you to Akash for creating the infrastructure and support that allow this project to launch!
This competition is for distillations of posts, papers, and research agendas. For short-form arguments for the importance of AI safety, see the AI Safety Arguments Competition.
Purpose
AIMS Series
I think that it is currently difficult for university students to find tangible ways to engage with AI Safety. Generally, by creating a series of AI Safety contests, I hope to:
Help build social capital for students who are interested in Alignment and potentially good at it.
Create ways for people to test their fit for Alignment work.
Create a “brand” around these contests over time so that CS students recognize the name and winners recommend the contests to their friends. Hopefully, this name recognition would also make it easier to form partnerships with CS organizations.
For this specific contest, I’m inspired by the arguments that the field of AI Alignment needs more distillers, both to improve communication within the field and to make its research accessible to a wider audience. The Distillation Contest aims to produce value by:
Recruiting CS students who have never heard of EA or Alignment before (I will be doing this outreach at UC Berkeley through advertising, but other organizers are welcome to advertise to their own groups).
Increasing the engagement of students who are already interested in Alignment.
Potentially producing useful distillations of Alignment research and increasing accessibility to said research.
Contest Description
The Distillation Contest asks that participants:
1) Pick an article/post/research paper on AI Alignment/Safety (ideally from our list below) that would benefit from being more clearly explained.
2) Indicate which ideas or sections of their chosen research should be distilled. Applicants can either distill a whole post/article, a specific part of the post/article, or multiple posts/articles.
3) Create a distillation: a clearer explanation of the research, along with a new example or new application of the research.
4) Optionally: If the research you’re distilling is trying to solve a problem, you can attempt to propose an additional solution to that problem and include it in your submission.
What makes a good distillation?
A good distillation explains the most confusing parts of another piece of writing; its value comes from creating new ways to understand difficult concepts or dense technical writing. A good distillation also helps readers see how the distilled ideas relate to other Alignment research. Because of this, creating a good distillation will likely require participants to read related research beyond their chosen post in order to make sure they fully understand the ideas presented in it.
As an example of a great distillation, Holden Karnofsky, after writing the Most Important Century series, created a roadmap to make the series more digestible and navigable. Additionally, Scott Alexander has distilled multiple complex dialogues (and even a meme) to make them more accessible.
We encourage applicants to choose a post/article to distill from the list below. Applicants may propose posts/articles outside of this list, although the judges may not consider those pieces convoluted enough to need distillation, so we recommend distilling from the list. (This list may change over time.)
Technical research papers from the Alignment Fundamentals Curriculum, especially the optional readings
Richard Ngo’s AGI Safety from First Principles sequence
Evan Hubinger’s Risks from Learned Optimization sequence
John Wentworth’s posts (see the first comment here):
The Pointers Problem: Human Values are a Function of Human Latent Variables
Debates about how to think about outer alignment and inner alignment (here, here, and here).
Eliciting Latent Knowledge technical report
Prizes
$2,500 - One prize available for 1st place submission.
$1,250 - One prize available for 2nd place submission.
$500 - Up to 5 prizes available.
$250 - Up to 10 prizes available.
All prize winners’ names will be posted on the EA Berkeley website, and selected distillations may also be posted there.
Scoring
Distillations will be scored on the following factors:
Depth of understanding
Clarity of presentation
Rigor of work
Concision/Length (longer submissions will need to present more information than shorter ones)
Originality of insight
Accessibility
Preference may be given to distillations that:
Synthesize multiple sources
Are accessible enough to serve as an introduction to a topic
Final Notes
There are a few other purposes to this contest that I did not list above but may write about in a future forum post! There are also likely some great articles worth distilling beyond the current list of recommendations (which was chosen by Akash Wasil). If you have top recommendations for articles you’d like to see distilled, I may add them to our existing list so that applicants are more likely to distill them.
Finally, since the contest is open to all students, please feel free to share our contest information with university students you know! Here is a link to our current advertising material for other organizers to distribute if they’d like.
Dan Hendrycks and I would love for somebody to distill some of his papers! https://arxiv.org/abs/2008.02275 https://arxiv.org/abs/2110.13136
Hi, are PhD students also allowed to submit? I would like to submit a distillation and would be fine with not receiving any money if I win a prize. If this complicates things too much, I would understand if you don’t want that.
Hi! I’ve been thinking about this a bit more, and I do think I want graduate students to be able to submit! However, since the main audience is meant to be undergraduate students, I may have to be harsher in evaluation, or, more excitingly, maybe I could create a new tier for graduate students? For now, I’d say feel free to submit; I’ll work out more specifics on my end and make an edit (+ reply to this) if I make official changes!
That sounds very reasonable. Thanks for the swift reply.
This is exciting; I really like this idea, and I’m glad it’s being put into action. Do you know who your judges are? I don’t have any technical knowledge myself, so I’m not speaking from an inside view, but one concern I have is whether it might be relatively easy to write distillations that seem good but contain subtle misunderstandings that matter and would be hard for someone not in the field to catch (certainly, in my conversations with friends who know more than I do, talking through my amateur understanding of the Discord chats has revealed some important gaps in my knowledge).
Great point! Early on, I had someone more connected than me make a list of potential judges. We have 15 names brainstormed and grouped by how much they know about alignment. I can say with pretty high certainty that we will have at least one person whose full-time job is alignment reading the submissions (likely someone with a CS doctorate), but hopefully we can get even more expertise :)
Awesome! Will you be releasing the list at some point?
Also, if you’re still on the lookout for more judges, I can potentially send people your way! If not, great!
I wasn’t anticipating releasing the list (partly because people may try to pander to a particular judge’s background, and partly to allow myself and the judges more flexibility in adding people at the last second).
Sending some judge recommendations my way would be great! I think having a variety of readers would be helpful :) Thank you!
Really cool! Just last week I was thinking about whether the alignment community should (massively) scale up prizes with relatively low barriers to entry!
Have you considered making this bigger? E.g. with more prizes and more active outreach to other universities?
I initially thought that, ideally, every contribution that clears a certain bar should be rewarded accordingly; that way there’s less uncertainty about payoffs and more people will contribute.
I think you could likely find more texts to recommend, but even duplicated distillations are still valuable for getting students thinking about alignment research and for identifying particularly promising candidates.
Evaluation time is a likely bottleneck, but you could probably find a handful of, e.g., AGI Safety Fundamentals alumni to volunteer a few hours, or many more if you offer compensation for helping out.
Thank you! These are thoughtful comments! I think I will try to add more texts and find more readers, as you suggest.
I’ve been thinking of making contest creation a potentially serious work project in the future, so I hope to create some larger-scale contests then! Right now, I’m rather limited in capacity. Thankfully, I’m connected with some other great university organizers whom I’ve told about advertising at their schools.
I think it would be tricky to set clear baseline cutoffs for distillations that still capture quality, since writing varies so much between people. Do you have any ideas for clear cutoffs that would retain quality (for future contests if nothing else)?
You probably have already seen that the contest was featured on AstralCodexTen, so you might get more obviously good submissions than you have prizes for, and it would feel like a wasted opportunity not to clearly signal (i.e. with money) to those authors that their work is highly appreciated and that we would love for them to do more of it.
Nice and nice! :)
Hmm, is your worry that distillations that in hindsight seem fairly sub-optimal (e.g. with major mistakes or confusing explanations) end up receiving the lowest-tier prize because of noise introduced by the people who rate the distillations? I think this might happen only rarely, for maybe 2 in 100 distillations? I think your list of scoring criteria already goes a long way toward giving raters a good idea of what solid work looks like. The money for the lowest tier would also not be a lot, maybe $200. Giving a prize to in-hindsight subpar work might reduce the prestige of the prize a little, but I think it’s a fairly junior prize anyway that mostly encourages and rewards solid initial efforts. Also, you would still have the higher tiers for especially good work, which would lose little prestige.
I do think it’s possible that we might award more prizes retroactively if we find we have received a lot of valuable submissions! Maybe an “honorable mentions” category.
Ah, I think my worry is that it feels difficult for me to find a standard to rate against that actually tracks quality. If I give a couple of examples, people may feel limited to making their work look like those examples. I might say “make your distillation 1,000 words and explain two papers and I’ll give you a prize,” but 1,500 words on one paper might have made the optimal submission, and I would have limited people’s options. I find it hard to quantify a bar for writing since everyone has such different approaches. I think the real bar is something more like “the judges who know more about AI Safety than me believe that you have communicated this idea really well,” and because of that it feels wrong for me to say “if you do x you will definitely win something.”
If they already get a prize, I wouldn’t call it “honorable mentions,” because that unnecessarily diminishes it in my eyes. Just have anything that seems like it would get a B- in school be in the same category as the $250 prize?
Ah, interesting, I have the opposite intuition! :D I completely agree that you shouldn’t give advice about the length of the distillations, but the criteria you mention here just seem really useful; I’d be surprised if, e.g., you found something clearly presented and accessible and I didn’t.
And I feel like somebody who has spent ~40 hours reading and discussing AI Safety material (e.g. as part of the AGI Safety Fundamentals course) could do a reasonably coherent job of rating understanding and rigor. Originality seems maybe the trickiest, as you probably have to have some grasp of which ideas/framings are already in the water and which aren’t.
Really excited for this! I think distillation will be useful not only for checking the distiller’s understanding, but also for better communicating ideas around AI safety. Thanks for starting up this project!
Hi, would Anthropic’s research agenda be a good candidate for distilling?
Would the definition of “student enrolled in a university/college” include master’s students? I would normally think so but the ACX signal boost from today describes this as an “undergraduate” contest, so I wanted to double check.
The primary audience for this contest is undergraduates, but Master’s students are allowed!
more distillations, yay! 🥳
Could you organize this to also include people who aren’t enrolled students?
Thank you!
Could you clarify what you mean? Do you mean students who are on a break from college, newly admitted students who aren’t yet attending, or something else?
I am referring to people who chose alternative career paths into AI, for example autodidacts and independent ML researchers.
Unfortunately, I created this contest to help build up university groups, so I think keeping the contest limited to enrolled students (including students who are entering college later this year and students who will graduate before the contest ends) would be the best way to ensure that students feel like they have an advantage in the contest. Thank you for clarifying!
Thanks for the explanation.
Other thoughts: Abram’s decision theory and Vanessa’s infra-Bayesianism work might be good for distillation. Also, it might be worth thinking about some type of collaboration with current distillers, such as Robert Miles or Mark Xu, and the site distill.pub?
Oh, great idea! If nothing else, distill.pub is a great resource for me to list!
Thanks for pointing that out!
Distill was never really about distillations in the sense this post refers to. It was a journal that focused on very high-quality presentation and visualizations. It’s also no longer active: https://distill.pub/2021/distill-hiatus/
Are multiple submissions allowed?
Sure :)